Abstract: We study polarization for nonbinary channels with input alphabet of size $q = 2^r$, $r = 2, 3, \ldots$. Using Arikan's polarizing kernel $H_2$, we prove that the virtual channels that arise in the process of polarization converge to $q$-ary channels with capacity $1, 2, \ldots, r$ bits, and that the total transmission rate approaches the symmetric capacity of the channel. This leads to an explicit transmission scheme for $q$-ary channels. The error probability of decoding using successive cancellation behaves as $\exp(-N^{\alpha})$, where $N$ is the code length and $\alpha$ is any constant less than 0.5.

I. Introduction 

Polarization is a new concept in information theory discovered in the context of capacity-achieving families of codes for symmetric memoryless channels and later generalized to source coding, multi-user channels and other problems. Polarization was first described by Arikan [1] who constructed binary codes that achieve capacity of symmetric memoryless channels (and "symmetric capacity" of general binary-input channels). The main idea of [1] is to combine the bits of the source sequence using repeated application of the "polarization kernel" $H_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$. The resulting linear code of length $N = 2^n$ has a generator matrix which forms a submatrix of $G_N = B H_2^{\otimes n}$, where $B$ is a permutation matrix. The choice of the rows of $G_N$ is governed by the polarization of virtual channels for individual bits that arise in the process of channel combining and splitting. Namely, the data bits are written in the coordinates that correspond to near-perfect channels while the other bits are fixed to some values known to both the transmitter and the decoder. It was shown later that polarization on binary channels can be achieved using a variety of other kernels: in particular, any $m \times m$ matrix whose columns cannot be arranged to form an upper triangular matrix achieves the desired polarization [2].

A study of polar codes for channels with nonbinary input was undertaken by Şaşoğlu et al. [3], [4] and Mori and Tanaka [5]. For prime $q$, it suffices to take the kernel $H_2$, while for nonprime alphabets, the kernel is time-varying and not explicit. Namely, for prime $q$, [3] showed that there exist permutations of the input alphabet such that the virtual channels for individual $q$-ary symbols become either fully noisy or perfect, and the proportion of perfect channels approaches the symmetric capacity, in analogy with the results for binary codes in [1]. At the same time, [3] remarks that the transmission scheme that uses the kernel $H_2$ with modulo-$q$ addition for composite $q$ does not necessarily lead to the polarization of the channels to the two extremes. Rather, they show that there exists a sequence of permutations of the input alphabet such that when they are combined with $H_2$, the virtual channels for the transmitted symbols become either nearly perfect or nearly useless.

The authors of [3] suggest several alternatives to the kernel $H_2$ that rely on randomized permutations or, in the case of $q = 2^r$, on multilevel schemes that implement polar coding for each of the bits of the symbol independently, combining them in the decoding procedure; see esp. [4].

In this paper we study polarization for channels with input alphabet of size $q = 2^r$, $r = 2, 3, \ldots$. Suppose that the channel is given by a stochastic matrix $W(y|x)$ where $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $\mathcal{X} = \{0, 1, \ldots, q-1\}$, and $\mathcal{Y}$ is a finite alphabet. Assuming that the channel combining is performed using the kernel $H_2$ with addition modulo $q$, we establish results about the polarization of channels for individual symbols. It turns out that virtual channels for the transmitted symbols converge to one of $r+1$ extremal configurations in which $j$ out of $r$ bits are transmitted near-perfectly while the remaining $r - j$ bits carry almost no information. Moreover, the good bits are always aligned to the right of the transmitted $r$-block, and no other situations arise in the limit. Thus, the extremal configurations for information rates that arise as a result of polarization are easily characterized: they form an upper-triangular matrix as described in Sect. II-B (see also Figs. 1, 2 in the final section of the paper). This characterization also constitutes the main difference of our results from the multilevel scheme in [4]: there, the set of extremal configurations can in principle have cardinality $2^r$, which complicates the code construction.

Another related work is the paper by Abbe and Telatar [6]. In it, the authors observed multilevel polarization in a somewhat different context. The main result of their paper provides a characterization of extremal points of the region of attainable rates when polar codes are used for each of the $r$ users of a multiple-access channel. Namely, as shown in [6] (see also [7]), these points form a subset in the set of vertices of a matroid on the set of $r$ users. [6] also remarks that these results translate directly to transmission over a $q$-ary DMC, showing that the rate polarizes to many levels. To explain the difference between [6] and our work we note that transmission over the multiple-access channel in [6] is set up in such a way that, once applied to the DMC, it corresponds to encoding each bit of the $q$-ary symbol by its own polar code (we again assume that $q = 2^r$). In other words, the polarization kernel employed is a linear operator $G = I_r \otimes H_2$. Thus, the group acting on $\mathcal{X}$ is $\mathbb{F}_2^r = \mathbb{Z}_2 \times \cdots \times \mathbb{Z}_2$ rather than the cyclic additive group of order $q$ considered in this paper.

This work began as an attempt to construct polar codes for the ordered symmetric channel, introduced in our earlier paper [8]. This channel provides an information-theoretic model related to the ordered distance on binary $r$-vectors, defined as follows:

$$d_r(x, x') \triangleq \max\{j : x_j \ne x'_j\}, \quad \text{where } x, x' \in \{0,1\}^r. \tag{1}$$

Below $\mathrm{wt}_r(x) = d_r(x, 0)$ denotes the ordered weight of the symbol $x$. The ordered distance is an instance of a large class of metrics introduced in [9] following works of Niederreiter in numerical analysis [10]. It has subsequently appeared in a large number of works in algebraic combinatorics and coding theory; see e.g., [11] and references therein. We find it quite interesting that it independently arises in the study of polar codes on channels with input of size $q = 2^r$. Examples of $q$-ary polar codes for ordered symmetric channels can be easily constructed and analyzed.

Last but not least, when this work was in its final stages, we became aware of the paper by Sahebi and Pradhan [12] who also observed the multilevel polarization phenomenon for $q$-ary channels. At the same time, [12] did not give a proof of polarization, which constitutes the main technical part of our work. The motivation of the approach of [12] relates to a detailed study of linear and group codes on $q$-ary channels, and is also different from our approach.

In the next section we state and prove the main result, the convergence of the channels to one of the $r + 1$ extremal configurations, and deduce that polar codes achieve the symmetric capacity of the channel. Then we derive the rate of polarization and estimate the error probability of decoding, and give some examples.

II. Polarization for q-ary channels

We consider combining of the $q$-ary data under the action of the operator $H_2$, where $q = 2^r$, $r \ge 2$. Let $W : \mathcal{X} \to \mathcal{Y}$, $|\mathcal{X}| = q$ be a discrete memoryless channel (DMC). The symmetric capacity of the channel $W$ equals

$$I(W) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{q} W(y|x) \log \frac{W(y|x)}{\frac{1}{q} \sum_{x' \in \mathcal{X}} W(y|x')}$$

where the base of the logarithm is 2. Define the combined channel $W_2$ and the channels $W^-$ and $W^+$ by

$$W_2(y_1, y_2 | u_1, u_2) = W(y_1 | u_1 + u_2)\, W(y_2 | u_2),$$

$$W^-(y_1, y_2 | u_1) = \sum_{u_2 \in \mathcal{X}} \frac{1}{q} W_2(y_1, y_2 | u_1, u_2), \tag{2}$$

$$W^+(y_1, y_2, u_1 | u_2) = \frac{1}{q} W_2(y_1, y_2 | u_1, u_2), \tag{3}$$

where $u_1, u_2, y_1, y_2$ are $r$-vectors and $+$ is a modulo-$q$ sum. This transformation can be applied recursively to the channels $W^-$, $W^+$ resulting in four channels of the form $W^{b_1 b_2}$, $b_1, b_2 \in \{+, -\}$. After $n$ steps we obtain $N = 2^n$ channels $W_N^{(j)}$, $j = 1, \ldots, N$. For the case $q = 2$ it is shown in [1] that as $n$ increases, the channels $W_N^{(j)}$ become either almost perfect or almost completely noisy (polarize). In formal terms, for any $\varepsilon > 0$

$$\lim_{n \to \infty} \frac{|\{b \in \{+,-\}^n : I(W^b) \in (\varepsilon, 1 - \varepsilon)\}|}{2^n} = 0. \tag{4}$$

In this paper we extend this result to the case $q = 2^r$, $r > 1$. As shown in [1], after $n$ steps of the transformation (2)-(3) the channels $W_N^{(i)} : \mathcal{X} \to \mathcal{Y}^N \times \mathcal{X}^{i-1}$, $1 \le i \le N$ are given by

$$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) = \frac{1}{q^{N-1}} \sum_{u_{i+1}^N \in \mathcal{X}^{N-i}} W^N(y_1^N | u_1^N G_N), \tag{5}$$

where $G_N = B H_2^{\otimes n}$ and $B$ is a permutation matrix. Here we use the shorthand notation for sequences of symbols: for instance, $y_1^N = (y_1, y_2, \ldots, y_N)$, etc.
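The single-step transformation (2)-(3) is easy to experiment with numerically. The following is a minimal sketch, not the authors' code: it assumes a channel is stored as a nested list `W[x][y]` holding $W(y|x)$, and the helper names (`symmetric_capacity`, `minus_transform`, `plus_transform`, `random_channel`) are hypothetical. It checks on a random quaternary channel that the two derived channels together preserve the symmetric capacity.

```python
import math
import random

def symmetric_capacity(W):
    """I(W) in bits for W[x][y] = W(y|x) with uniformly distributed input."""
    q, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        avg = sum(W[x][y] for x in range(q)) / q  # output probability of y
        for x in range(q):
            p = W[x][y]
            if p > 0:
                total += p / q * math.log2(p / avg)
    return total

def minus_transform(W):
    """W^-(y1,y2|u1) = (1/q) * sum over u2 of W(y1|u1+u2) W(y2|u2), Eq. (2)."""
    q, ny = len(W), len(W[0])
    out = [[0.0] * (ny * ny) for _ in range(q)]
    for u1 in range(q):
        for u2 in range(q):
            for y1 in range(ny):
                for y2 in range(ny):
                    out[u1][y1 * ny + y2] += W[(u1 + u2) % q][y1] * W[u2][y2] / q
    return out

def plus_transform(W):
    """W^+(y1,y2,u1|u2) = (1/q) W(y1|u1+u2) W(y2|u2), Eq. (3)."""
    q, ny = len(W), len(W[0])
    out = [[0.0] * (ny * ny * q) for _ in range(q)]
    for u2 in range(q):
        for u1 in range(q):
            for y1 in range(ny):
                for y2 in range(ny):
                    idx = (y1 * ny + y2) * q + u1  # flatten the output triple
                    out[u2][idx] = W[(u1 + u2) % q][y1] * W[u2][y2] / q
    return out

def random_channel(q, ny, rng):
    """A random stochastic matrix with q rows (inputs) and ny columns (outputs)."""
    rows = []
    for _ in range(q):
        w = [rng.random() for _ in range(ny)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows

rng = random.Random(0)
W = random_channel(4, 3, rng)
i_w = symmetric_capacity(W)
i_minus = symmetric_capacity(minus_transform(W))
i_plus = symmetric_capacity(plus_transform(W))
print(i_minus, i_w, i_plus, i_minus + i_plus - 2 * i_w)
```

The output alphabet grows quickly under repeated application (roughly squaring at each step), so this brute-force representation is only practical for one or two levels of recursion.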

A. Notation 

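For any pair of input symbols $x, x' \in \mathcal{X}$, the Bhattacharyya distance between them is

$$Z(W_{\{x,x'\}}) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|x)\, W(y|x')}$$

where $W_{\{x,x'\}}$ is the channel obtained by restricting the input alphabet of $W$ to the subset $\{x, x'\} \subset \mathcal{X}$. Define the quantity $Z_v(W)$ for $v \in \mathcal{X} \setminus \{0\}$:

$$Z_v(W) = \frac{1}{2^r} \sum_{x \in \mathcal{X}} Z(W_{\{x, x+v\}}).$$

Introduce the $i$th average Bhattacharyya distance of the channel $W$ by

$$Z_i(W) = \frac{1}{2^{i-1}} \sum_{v \in \mathcal{X}_i} Z_v(W) \tag{6}$$

where $i = 1, 2, \ldots, r$ and $\mathcal{X}_i = \{v \in \mathcal{X} : \mathrm{wt}_r(v) = i\}$. Then

$$Z(W) := \frac{1}{2^r (2^r - 1)} \sum_{x \ne x'} Z(W_{\{x,x'\}}) = \frac{1}{2^r - 1} \sum_{i=1}^{r} 2^{i-1} Z_i(W). \tag{7}$$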
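As a sanity check of the definitions above, here is a short sketch (an illustration under stated assumptions, not part of the original paper) that computes $Z_v(W)$, $Z_i(W)$ and $Z(W)$ and verifies identity (7) on a random channel. It assumes that a symbol $x = (x_1, \ldots, x_r)$ is identified with an integer modulo $q$ whose most significant bit is $x_1$, so that $\mathrm{wt}_r(v) = r$ exactly for odd $v$; this convention is inferred from the later use of multiples $tv$ in Lemma 5. All function names are hypothetical.

```python
import math
import random

def ordered_weight(v, r):
    """wt_r(v): index of the last nonzero bit, with bit 1 the most significant."""
    if v == 0:
        return 0
    trailing_zeros = (v & -v).bit_length() - 1
    return r - trailing_zeros

def bhattacharyya(W, x, xp):
    """Z(W_{x,x'}) = sum over y of sqrt(W(y|x) W(y|x'))."""
    return sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))

def z_v(W, v):
    """Z_v(W): average of Z(W_{x,x+v}) over all x, addition modulo q."""
    q = len(W)
    return sum(bhattacharyya(W, x, (x + v) % q) for x in range(q)) / q

def z_i(W, i):
    """Z_i(W) of Eq. (6): average of Z_v(W) over v of ordered weight i."""
    q = len(W)
    r = q.bit_length() - 1
    vs = [v for v in range(1, q) if ordered_weight(v, r) == i]
    return sum(z_v(W, v) for v in vs) / len(vs)

def z_avg(W):
    """Z(W): average Bhattacharyya distance over all ordered pairs x != x'."""
    q = len(W)
    total = sum(bhattacharyya(W, x, xp)
                for x in range(q) for xp in range(q) if x != xp)
    return total / (q * (q - 1))

rng = random.Random(1)
q, r, ny = 8, 3, 5
W = []
for _ in range(q):
    w = [rng.random() for _ in range(ny)]
    W.append([p / sum(w) for p in w])

lhs = z_avg(W)
rhs = sum(2 ** (i - 1) * z_i(W, i) for i in range(1, r + 1)) / (q - 1)
print(lhs, rhs)

# Quaternary channel that reveals only the parity of the input symbol.
W4 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print([z_i(W4, i) for i in (1, 2)])
```

The second example is the quaternary channel discussed later in the Remark on [3]; under the stated bit convention its vector $(Z_1, Z_2)$ is one of the extremal configurations.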



Recall the setting of [1] for the evolution of the channel parameters. On the set $\Omega = \{+, -\}^{\infty}$ of semi-infinite binary sequences define a $\sigma$-algebra $\mathcal{F}$ on $\Omega$ generated by the cylinder sets $S(b_1, \ldots, b_n) = \{\omega \in \Omega : \omega_1 = b_1, \ldots, \omega_n = b_n\}$ for all sequences $(b_1, \ldots, b_n) \in \{+, -\}^n$ and for all $n \ge 0$. Consider the probability space $(\Omega, \mathcal{F}, P)$, where $P(S(b_1, \ldots, b_n)) = 2^{-n}$, $n \ge 0$. Define a filtration $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$ where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_n$, $n \ge 1$ is generated by the cylinder sets $S(b_1, \ldots, b_n)$, $b_i \in \{+, -\}$.

Let $B_i$, $i = 1, 2, \ldots$ be i.i.d. $\{+, -\}$-valued random variables with $\Pr(B_i = +) = \Pr(B_i = -) = 1/2$. The random channel emerging at time $n$ will be denoted by $W^B$, where $B = (B_1, B_2, \ldots, B_n)$. Thus, $P(W^B = W_N^{(i)}) = 2^{-n}$ for all $i = 1, \ldots, 2^n$. Let $W_n = W^B$, $I_n = I(W_n)$, $Z_{\{x,x'\},n} = Z(W_{n,\{x,x'\}})$, $Z_{v,n} = Z_v(W_n)$, and $Z_{i,n} = Z_i(W_n)$. These random variables are adapted to the above filtration (meaning that $I_n$ etc. are measurable w.r.t. $\mathcal{F}_n$ for every $n \ge 1$).

B. Channel polarization 

In this section we state a sequence of results that shows that $q$-ary polar codes based on the kernel $H_2$ can be used to transmit reliably over the channel $W$ for all rates $R < I(W)$.

Theorem 1: (a) Let $n \to \infty$. The random variable $I_n$ converges a.e. to a random variable $I_\infty$ with $E(I_\infty) = I(W)$.
(b) For all $i = 1, 2, \ldots, r$

$$\lim_{n \to \infty} Z_{i,n} = Z_{i,\infty} \quad \text{a.e.},$$

where the variables $Z_{i,\infty}$ take values 0 and 1. With probability one the vector $(Z_{i,\infty}, i = 1, \ldots, r)$ takes one of the following values:

$$\begin{array}{l}
(Z_{1,\infty} = 0,\ Z_{2,\infty} = 0,\ \ldots,\ Z_{r-1,\infty} = 0,\ Z_{r,\infty} = 0) \\
(Z_{1,\infty} = 1,\ Z_{2,\infty} = 0,\ \ldots,\ Z_{r-1,\infty} = 0,\ Z_{r,\infty} = 0) \\
(Z_{1,\infty} = 1,\ Z_{2,\infty} = 1,\ \ldots,\ Z_{r-1,\infty} = 0,\ Z_{r,\infty} = 0) \\
(Z_{1,\infty} = 1,\ Z_{2,\infty} = 1,\ \ldots,\ Z_{r-1,\infty} = 1,\ Z_{r,\infty} = 0) \\
(Z_{1,\infty} = 1,\ Z_{2,\infty} = 1,\ \ldots,\ Z_{r-1,\infty} = 1,\ Z_{r,\infty} = 1).
\end{array} \tag{8}$$



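Let us restate part (b) of this theorem for finite $n$.

Proposition 1: Let $\varepsilon, \delta > 0$ be fixed. For $k = 0, 1, \ldots, r$ define disjoint events

$$B_{k,n}(\varepsilon) = \{\omega : (Z_{1,n}, Z_{2,n}, \ldots, Z_{r,n}) \in \mathcal{R}_k\}$$

where $\mathcal{R}_k = \mathcal{R}_k(\varepsilon) \triangleq \big(\prod_{i=1}^{k} D_1\big) \times \big(\prod_{i=k+1}^{r} D_0\big)$ and $D_0 = [0, \varepsilon)$, $D_1 = (1 - \varepsilon, 1]$. Then $P\big(\bigcup_{k=0}^{r} B_{k,n}(\varepsilon)\big) \ge 1 - \delta$ starting from some $n = n(\varepsilon, \delta)$.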
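The regions $\mathcal{R}_k(\varepsilon)$ translate directly into a membership test. The sketch below is illustrative only (the function name `region_index` is hypothetical): given a vector $(Z_1, \ldots, Z_r)$ it returns the index $k$ of the region containing it, or `None` if the vector is not yet close to any extremal configuration.

```python
def region_index(z, eps):
    """Return k with z in R_k(eps) = D_1^k x D_0^(r-k), or None if no k fits.

    D_0 = [0, eps) and D_1 = (1 - eps, 1]; for eps <= 1/2 the regions are
    disjoint, so at most one k can match.
    """
    r = len(z)
    for k in range(r + 1):
        head_noisy = all(zi > 1 - eps for zi in z[:k])  # first k coordinates in D_1
        tail_good = all(zi < eps for zi in z[k:])       # remaining ones in D_0
        if head_noisy and tail_good:
            return k
    return None

print(region_index([1.0, 0.99, 0.01], 0.05))
print(region_index([0.5, 0.0, 0.0], 0.05))
```

In the transmission scheme of Sect. II-C, a coordinate whose vector falls in $\mathcal{R}_k(\varepsilon)$ carries data in its $r - k$ rightmost bits.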

The proofs of these statements are given in a later part of this 
section. 

We need the following lemma. 

Lemma 1: For a DMC with $q$-ary input, $I(W)$ and $Z(W)$ are related by

$$I(W) \ge \log \frac{2^r}{1 + \sum_{i=1}^{r} 2^{i-1} Z_i(W)} \tag{9}$$

$$I(W) \le \sum_{i=1}^{r} \sqrt{1 - Z_i(W)^2}. \tag{10}$$

For $r = 1$ these inequalities are proved in [1]. For $r > 1$ Eq. (9) is a restatement of [3, Prop. 3] using (7). The fact that (10) holds for all $r > 1$ is new, and is proved in the Appendix.

Inequalities (9)-(10) imply that if $(Z_1, \ldots, Z_r) \in \mathcal{R}_k(\varepsilon)$ then $|I(W) - (r - k)| \le \delta$ where $\delta \ge \max\big(k\sqrt{2\varepsilon},\ (2^{r-k} - 1)\,\varepsilon \log e\big)$.
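The two bounds of Lemma 1 can be evaluated numerically for small channels. The following sketch is an illustration rather than a proof: it assumes the same conventions as before (channel stored as `W[x][y]`, symbols identified with integers modulo $q$ with $x_1$ the most significant bit), and all helper names are hypothetical. It returns the right-hand side of (9), the symmetric capacity, and the right-hand side of (10).

```python
import math
import random

def ordered_weight(v, r):
    """wt_r(v) with bit 1 the most significant bit; wt_r(0) = 0."""
    if v == 0:
        return 0
    return r - ((v & -v).bit_length() - 1)

def capacity(W):
    """Symmetric capacity I(W) in bits."""
    q, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        avg = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log2(W[x][y] / avg)
    return total

def z_i(W, i):
    """Average Bhattacharyya distance Z_i(W) of Eq. (6)."""
    q = len(W)
    r = q.bit_length() - 1
    vs = [v for v in range(1, q) if ordered_weight(v, r) == i]
    acc = 0.0
    for v in vs:
        for x in range(q):
            acc += sum(math.sqrt(a * b) for a, b in zip(W[x], W[(x + v) % q]))
    return acc / (q * len(vs))

def lemma1_bounds(W):
    """Return (lower bound from (9), I(W), upper bound from (10))."""
    q = len(W)
    r = q.bit_length() - 1
    zs = [z_i(W, i) for i in range(1, r + 1)]
    weighted = sum(2 ** i * z for i, z in enumerate(zs))  # sum of 2^(i-1) Z_i
    lower = math.log2(q / (1 + weighted))
    upper = sum(math.sqrt(max(0.0, 1 - z * z)) for z in zs)
    return lower, capacity(W), upper

rng = random.Random(2)
rows = [[rng.random() for _ in range(3)] for _ in range(4)]
W = [[p / sum(row) for p in row] for row in rows]
print(lemma1_bounds(W))

# Channel revealing only the parity of a quaternary symbol: both bounds are tight.
W4 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(lemma1_bounds(W4))
```

For the parity channel the three numbers coincide, which matches the observation that it sits exactly at an extremal configuration with one perfect bit.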

The following proposition is an immediate corollary of the 
above results. 

Proposition 2: (a) The random variable $I_\infty$ is supported on the set $\{0, 1, \ldots, r\}$.

(b) For every $0 \le k \le r$ and every $\delta > 0$ there exists $\varepsilon > 0$ such that

$$\lim_{n \to \infty} P\big(\{|I_n - (r - k)| \le \delta\} \,\Delta\, B_{k,n}(\varepsilon)\big) = 0.$$

(c) $E(|\{i : Z_{i,\infty} = 0\}|) = I(W)$.

Proof: The first statement is obvious from (9)-(10). To prove the second statement we note that, with the appropriate choice of $\varepsilon$,

$$\{|I_n - (r - k)| \le \delta\} \supseteq B_{k,n}(\varepsilon)$$

for all $n \ge 0$. At the same time, $P(\{|I_n - (r - k)| \le \delta\} \cap B_{k',n}(\varepsilon)) = 0$ for all $k' \ne k$, and $P(\bigcup_k B_{k,n}(\varepsilon)) \to 1$ for any $\varepsilon > 0$. Together this implies (b). Finally, we have that $E(I_\infty) = I(W)$. Then use (a) and (b) to claim that $E(|\{i : Z_{i,\infty} = 0\}|) = \sum_{k=0}^{r} k\, P(I_\infty = k) = I(W)$. ■

We can say a bit more about the nature of convergence established in this proposition. Let us fix $k \in \{0, 1, \ldots, r\}$ and define the channel for the $r - k$ rightmost bits of the transmitted symbol as follows:

$$W^{[r-k]}(y|u) = \frac{1}{2^k} \sum_{x \in \mathcal{X} :\, x_{k+1}^{r} = u} W(y|x), \quad u \in \{0,1\}^{r-k},$$

where $x = (x_1, x_2, \ldots, x_r)$.

Lemma 2: Let $V : \mathcal{X} \to \mathcal{Y}$ be a DMC and let $\delta > 0$. Suppose that $(Z_1(V), Z_2(V), \ldots, Z_r(V)) \in \mathcal{R}_k(\varepsilon)$ for some $0 \le k \le r$. If $\varepsilon$ is sufficiently small, then $I(V^{[r-k]}) \ge r - k - \delta$.
In particular, it suffices to take e < 2^^+^ / {2"^^^ — 1). 

Proof: We may assume that 1 < fc < r — 1. Let u E 
X'^^^,x = (a;i, . . . ,Xfc,u) € X,x' = [x'^, . . . ,x'f.,u) E X. 
Let V e {0, 1}'"~''\{0} and consider 



y \ X x' 

^ 4 E E E ^y{vV)y{vV' + «') 



y X x' 

^ ^E^'^^(^.^'+'''}) 

x.x' 

<2^e 



where $v' = 0^k v_1 v_2 \ldots v_{r-k}$. The last inequality follows from the fact that $Z_i(V) < \varepsilon$ for $i = k+1, \ldots, r$. Since $Z_i(V^{[r-k]})$ is the average of the $Z_v(V^{[r-k]})$ over all $v$ with $\mathrm{wt}_r(v) = i$, we have $Z_i(V^{[r-k]}) \le 2^k \varepsilon$ for all $i = 1, \ldots, r - k$. Now the lemma follows from (9) in Lemma 1. ■

It turns out that the channels for individual bits converge to either perfect or fully noisy channels. If the channel for bit $j$ is perfect then the channels for all bits $i$, $r \ge i > j$ are perfect. If the channel for bit $i$ is noisy then the channels for all bits $j$, $1 \le j < i$ are noisy. The total number of near-perfect bits approaches $I(W)$. This is made formal in the next proposition.

Proposition 3: Let $\Omega_k = \{\omega : (Z_{1,\infty}, Z_{2,\infty}, \ldots, Z_{r,\infty}) = 1^k 0^{r-k}\}$, $k = 0, 1, \ldots, r$. For every $\omega \in \Omega_k$

$$\lim_{n \to \infty} |I_n - I(W_n^{[r-k]})| = 0.$$

Proof: For every $\omega \in \Omega_k$ we have that $I_n(\omega) \to r - k$. Combining this with the previous lemma and Proposition 2(b), we conclude that for such $\omega$ also $I(W_n^{[r-k]}) \to r - k$. ■

The concluding claim of this section describes the channel polarization and establishes that the total number of bits sent over almost noiseless channels approaches $N I(W)$.

Theorem 2: For any DMC $W : \mathcal{X} \to \mathcal{Y}$ the channels $W_N^{(i)}$ polarize to one of the $r + 1$ extremal configurations. Namely, let $V_i = W_N^{(i)}$ and

$$\pi_{k,N} = \frac{|\{i \in [N] : |I(V_i) - k| < \delta \ \wedge\ |I(V_i^{[k]}) - k| < \delta\}|}{N}$$

where $\delta > 0$, then $\lim_{N \to \infty} \pi_{k,N} = P(I_\infty = k)$ for all $k = 0, 1, \ldots, r$. Consequently

$$\sum_{k=1}^{r} k\, \pi_k = I(W),$$

where $\pi_k = \lim_{N \to \infty} \pi_{k,N}$.

This theorem follows directly from Theorem 1 and Propositions 2 and 3. Some examples of convergence to the extremal configurations described by this theorem are given in Sect. III below.

C. Transmission with polar codes

Let us describe a scheme of transmitting over the channel $W$ with polar codes. Take $\varepsilon > 0$ and choose a sufficiently large $n$. Assume that the length of the code is $N = 2^n$. Proposition 1 implies that the set $[N]$, apart from a small subset, is partitioned into $r + 1$ subsets $A_{k,n}$ such that for $j \in A_{k,n}$ the vector $(Z_1(W_N^{(j)}), Z_2(W_N^{(j)}), \ldots, Z_r(W_N^{(j)})) \in \mathcal{R}_k(\varepsilon)$. Each $j \in A_{k,n}$ refers to an $r$-bit symbol in which the $r - k$ rightmost bits correspond to small values of $Z_i(W_N^{(j)})$. To transmit data over the channel, we write the data bits in these coordinates and encode them using the linear transformation $G_N$.

More specifically, let us order the coordinates $j \in [N]$ by the increase of the quantity $\sum_{i=1}^{r} 2^{i-1} Z_i(W_N^{(j)})$ and use these numbers to locate the subsets $A_{k,n}$. We transmit data by encoding messages $u_1^N = (u_1, \ldots, u_N)$ in which if $j \in A_{k,n}$, $k = 0, \ldots, r - 1$ then the symbol $u_j$ is taken from the subset of symbols of $\mathcal{X}$ with the first $k$ bits fixed and known to both the encoder and the decoder ([1] calls them frozen bits). In particular, the subset $A_{r,n}$ is not used to transmit data. A polar codeword is computed as $x_1^N = u_1^N G_N$ and sent over the channel.

Decoding is performed using the "successive cancellation" procedure of [1] with the obvious constraints on the symbol values. Namely, for $j = 1, \ldots, N$ put

$$\hat{u}_j = \begin{cases} u_j, & j \in A_{r,n} \\ \arg\max_x W_N^{(j)}(y_1^N, \hat{u}_1^{j-1} | x), & j \in \bigcup_{k \le r-1} A_{k,n} \end{cases}$$

where if $j \in A_{k,n}$, $k = 0, 1, \ldots, r - 1$, then the maximum is computed over the symbols $x \in \mathcal{X}$ with the fixed (known) values of the first $k$ bits.

The error probability of this decoding is estimated in Sect. II-E.

D. Proof of Theorem 1

Part (a) of Theorem 1 follows straightforwardly from [1], [3]. Namely, as shown in [1, Prop. 4], $I(W^+) + I(W^-) = 2 I(W)$. We note that the proof in [1] uses only the fact that $u_1, u_2$ are recoverable from $x_1, x_2$, which is true in our case. Hence the sequence $I_n$, $n \ge 1$ forms a bounded martingale. By Doob's theorem [13, p. 196], it converges a.e. and in $L^1(\Omega, \mathcal{F}, P)$ to a random variable $I_\infty$ with $E(I_\infty) = I(W)$.

To prove part (b) we show that each of the $Z_{i,n}$ converges a.s. to a (0, 1) Bernoulli random variable $Z_{i,\infty}$. This convergence occurs in a concerted way in that the limit r.v.'s obey $Z_{j,\infty} \ge Z_{i,\infty}$ a.e. if $j < i$. This is shown by observing that for any fixed $i = 1, \ldots, r$ and for all $v \in \mathcal{X}_i$, the $Z_{v,n}(W)$ converge to identical copies of a Bernoulli random variable.

1) Convergence of $Z_{v,n}$, $v \in \mathcal{X}$: In this section we shall prove that the Bhattacharyya parameters $Z_{v,n}$ converge almost surely to Bernoulli random variables. The proof forms the main technical result of this paper and is accomplished in several steps.

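Lemma 3: Let

$$Z_{\max}^{(i)}(W) = \max_{v \in \mathcal{X}_i} Z_v(W), \quad i = 1, \ldots, r.$$

Then

$$Z_{\max}^{(r-j)}(W^+) = Z_{\max}^{(r-j)}(W)^2, \quad j = 0, \ldots, r - 1, \tag{11}$$

$$Z_{\max}^{(r)}(W^-) \le q\, Z_{\max}^{(r)}(W), \tag{12}$$

$$Z_{\max}^{(r-1)}(W^-) \le \frac{q}{2} Z_{\max}^{(r)}(W) + \frac{q}{2} Z_{\max}^{(r-1)}(W), \tag{13}$$

and generally

$$Z_{\max}^{(r-j)}(W^-) \le \frac{q}{2} Z_{\max}^{(r)}(W) + \frac{q}{4} Z_{\max}^{(r-1)}(W) + \cdots + \frac{q}{2^j} Z_{\max}^{(r-j+1)}(W) + \frac{q}{2^j} Z_{\max}^{(r-j)}(W). \tag{14}$$

Proof: In [3] it is shown that for all $v \in \mathcal{X} \setminus \{0\}$

$$Z_v(W^+) = Z_v(W)^2, \tag{15}$$

$$Z_v(W^-) \le 2 Z_v(W) + \sum_{\delta \in \mathcal{X} \setminus \{0, -v\}} Z_\delta(W)\, Z_{v+\delta}(W). \tag{16}$$

The first of these two equations implies (11). Now take $v \in \mathcal{X}_r$. Then in the sum on the right-hand side of (16) we have that either $\delta \in \mathcal{X}_r$ or $\delta + v \in \mathcal{X}_r$, and

$$Z_v(W^-) \le 2 Z_v(W) + (q - 2)\, Z_{\max}^{(r)}(W),$$

implying (12). Now take $v \in \mathcal{X}_{r-j}$, $j \ge 1$. The sum on $\delta$ in (16) contains $q/2$ terms with $\delta \in \mathcal{X}_r$, $q/4$ terms with $\delta \in \mathcal{X}_{r-1}$, and so on, before reaching $\mathcal{X}_{r-j}$. Finally, let $\delta \in \bigcup_{i=1}^{r-j} \mathcal{X}_i \setminus \{-v\}$. There are $(q/2^j) - 2$ possibilities, and for each of them either $v + \delta$ or $\delta$ is in $\mathcal{X}_{r-j}$. This implies (14) and therefore also (13). ■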

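In particular, take $j = 0$. Relations (11), (12) imply that

$$Z_{\max,n+1}^{(r)} = \big(Z_{\max,n}^{(r)}\big)^2 \quad \text{if } B_{n+1} = +, \tag{17}$$

$$Z_{\max,n+1}^{(r)} \le q\, Z_{\max,n}^{(r)} \quad \text{if } B_{n+1} = -. \tag{18}$$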



Iterated random maps of this kind were studied in [14] which 
contains general results on their convergence and stationary 
distributions. We need more detailed information about this 
process, established in the following lemma. 

Lemma 4: Let $U_n$, $n \ge 0$ be a sequence of random variables adapted to a filtration $\mathcal{F}_n$ with the following properties:

(i) $U_n \in [0, 1]$

(ii) $P(U_{n+1} = U_n^2 \mid \mathcal{F}_n) \ge 1/2$

(iii) $U_{n+1} \le q\, U_n$ for some $q \in \mathbb{Z}_+$.

Then there are events $\Omega_0, \Omega_1$ such that $P(\Omega_0 \cup \Omega_1) = 1$ and $U_n(\omega) \to i$ for $\omega \in \Omega_i$, $i = 0, 1$.
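To get a feel for the statement of Lemma 4, one can simulate a particular process satisfying (i)-(iii). The sketch below uses one specific (assumed) instance, $U_{n+1} = U_n^2$ with probability 1/2 and $U_{n+1} = \min(1, qU_n)$ otherwise; the lemma covers a much broader class, so this is only an illustration. The process is tracked through $\log_2 U_n$ to avoid floating-point underflow, and the names `simulate_path` and `summarize` are hypothetical.

```python
import math
import random

def simulate_path(u0, q, steps, rng):
    """Return log2(U_steps) for one path of the example process.

    Squaring U doubles log2(U); multiplying by q adds log2(q), capped at
    log2(1) = 0 so that U stays in [0, 1].
    """
    log_u = math.log2(u0)
    log_q = math.log2(q)
    for _ in range(steps):
        if rng.random() < 0.5:
            log_u *= 2.0
        else:
            log_u = min(0.0, log_u + log_q)
    return log_u

def summarize(u0=0.3, q=4, steps=200, paths=2000, seed=0):
    """Fractions of paths that end at U = 1 and at U < 2^(-1000)."""
    rng = random.Random(seed)
    finals = [simulate_path(u0, q, steps, rng) for _ in range(paths)]
    frac_one = sum(1 for v in finals if v == 0.0) / paths
    frac_zero = sum(1 for v in finals if v < -1000.0) / paths
    return frac_zero, frac_one

frac_zero, frac_one = summarize()
print(frac_zero, frac_one, frac_zero + frac_one)
```

In this instance the value 1 is absorbing, and paths that escape toward 0 do so doubly exponentially fast, so after a moderate number of steps essentially every path is at one of the two extremes.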

Proof: (a) First let us rescale the process $U_n$ so that in the neighborhood of zero it has a drift to zero. Let $\beta \in (0, 1)$ be such that

$$q^{\beta} - 1 \le 1/4.$$

Let $X_n = U_n^{\beta}$. Take $\tau(\omega)$ to be the first time when $X_n(\omega) \ge 1/2$. Let $Y_n = X_{\min(n,\tau)}$. On the event $Y_n \ge 1/2$ we have $Y_n = Y_{n+1}$, so that

$$E(Y_{n+1} - Y_n \mid \mathcal{F}_n) = 0,$$

while on the event $Y_n < 1/2$ we have

$$E(Y_{n+1} - Y_n \mid \mathcal{F}_n) \le \tfrac{1}{2}(Y_n^2 - Y_n) + \tfrac{1}{2}(q^{\beta} Y_n - Y_n) \le 0.$$

This implies that the sequence $Y_n$, $n \ge 0$ forms a supermartingale which is bounded between 0 and 1. By the convergence theorem, $Y_n \to Y_\infty$ a.e. and in $L^1(\Omega, \mathcal{F}, P)$, where $Y_\infty$ is a random variable supported on $[0, 1]$. This implies that $E Y_0 \ge E Y_n \downarrow E Y_\infty$. Further, if $X_0 \in [0, 1/4]$ then (since $E Y_0 = E X_0$)

$$P(Y_\infty \ge 1/2) \le 2 E Y_0 \le 1/2. \tag{19}$$

(b) Now we shall prove that $P(Y_\infty \in (\delta, 1/2 - \delta)) = 0$ for any $\delta > 0$. From (ii) it follows that $P(X_{n+1} = X_n^2 \mid \mathcal{F}_n) \ge 1/2$, which implies that

$$P(Y_{n+1} = Y_n^2 \mid \mathcal{F}_n) \ge 1/2 \quad \text{on } Y_n < 1/2 \tag{20}$$

for all $n \ge 0$. Suppose that $Y_\infty$ takes values in $(\delta, 1/2 - \delta)$ with probability $\alpha > 0$. Let $A_n = \{\omega : Y_n \in (\delta, 1/2 - \delta)\}$. Since $Y_n \to Y_\infty$ a.e., the Egorov theorem implies that there is a subset of probability arbitrarily close to $P(A_n)$ on which this convergence is uniform, and thus $P(A_n) \ge \alpha/2$ for all sufficiently large $n$. Therefore

$$P(|Y_{n+1} - Y_n| \ge \delta/2) \ge P(Y_{n+1} = Y_n^2,\ Y_n \in (\delta, 1/2 - \delta)) \ge \frac{\alpha}{4},$$

the last step by (20). This however contradicts the almost sure convergence of $Y_n$.

(c) This implies that $P(Y_\infty < 1/2) = P(Y_n \to 0) = P(U_n \to 0)$. From (19)

$$P(U_n \to 0) \ge \frac{1}{2} \quad \text{provided that } U_0 \le \left(\tfrac{1}{4}\right)^{1/\beta}. \tag{21}$$

Moreover, if $U_0 < (1/2)^{1/\beta}$ then either $Y_n \to 0$ or $Y_n \ge 1/2$ for some $n$. This translates to

$$P\big((U_n \to 0) \text{ or } (U_n \ge (1/2)^{1/\beta} \text{ for some } n)\big) = 1 \tag{22}$$

provided that $U_0 < (1/2)^{1/\beta}$.

(d) Let $\delta > 0$ be such that $q\,(\tfrac{1}{4})^{1/\beta} < 1 - \delta$ (depending on $q$ this may require taking a sufficiently small $\beta$). Let $L := [0, (\tfrac{1}{4})^{1/\beta}]$ and $R := [1 - \delta, 1]$. Observe that the process $U_n$ cannot move from $L$ to $R$ without visiting $C := ((\tfrac{1}{4})^{1/\beta}, 1 - \delta)$. Let $\sigma_1$ be the first time when $U_n \in C$, let $\eta_1$ be the first time after $\sigma_1$ when $U_n \in L \cup R$, let $\sigma_2$ be the first time after $\eta_1$ when $U_n \in C$, etc., $\sigma_1 < \eta_1 < \sigma_2 < \eta_2 < \cdots$. We shall prove that every sample path of the process eventually stays outside $C$, i.e., that for almost all $\omega$ there exists $k = k(\omega) < \infty$ such that $\sigma_k(\omega) = \infty$.

Assume the contrary, i.e., $\lim_{k \to \infty} P(\sigma_k < \infty) = \alpha > 0$ (since $P(\sigma_{k+1} < \infty) \le P(\sigma_k < \infty)$, this limit exists). We have

$$P(\exists k : \sigma_k = \infty) \ge \sum_{j=1}^{\infty} P(\sigma_j < \infty;\ U_{\eta_j} \in L;\ \sigma_{j+1} = \infty) \ge \alpha \sum_{j=1}^{\infty} P(U_{\eta_j} \in L;\ \sigma_{j+1} = \infty \mid \sigma_j < \infty). \tag{23}$$

Consider the process $U'_n = U_{\sigma_k + n}$ on the event $\sigma_k < \infty$ (with the measure renormalized by $P(\sigma_k < \infty)$). This process has the same properties (i)-(iii) as $U_n$. Let $J = \lceil \log_2(\tfrac{1}{\beta} \log_{1-\delta} \tfrac{1}{4}) \rceil$, then $x^{2^J} \in L$ for any $x \in C$. Therefore, $P(U'_J \in L) \ge 2^{-J}$ by property (ii). Now consider the process $U'_{J+n}$ on the event $U'_J \in L$. This process has properties (i)-(iii), so we can use (21) to conclude that the probabilities $P(U_{\eta_k} \in L;\ \sigma_{k+1} = \infty \mid \sigma_k < \infty)$ are bounded away from zero uniformly in $k$. But then the sum in (23) is equal to infinity, a contradiction.

(e) The proof is completed by showing that the probability of $U_n$ staying in $C^c = [0, 1] \setminus C$ without converging to zero is zero. We know that almost all trajectories stay outside $C$, so suppose that the process starts in $(0, (1/2)^{1/\beta})$. Then the probability that it enters $L$ in a finite number of steps is uniformly bounded from below (this is shown similarly to (23)), so the probability that it does not go to $L$ is zero. Next assume that the process starts in $L$, then by (22) it either goes to zero or enters $C$ with probability one. Together with part (d) this implies that the process that starts in $L$ converges to zero or one with probability one. ■

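Lemma 5: Let $V : \mathcal{X} \to \mathcal{Y}$ be a channel. Let $v, v' \in \mathcal{X} \setminus \{0\}$ be such that $\mathrm{wt}_r(v) \ge \mathrm{wt}_r(v')$. For any $\delta' > 0$ there exists $\delta > 0$ such that $Z_{v'}(V) \ge 1 - \delta'$ whenever $Z_v(V) \ge 1 - \delta$. In particular, we can take $\delta = \delta' q^{-3}$.

Proof: If $\mathrm{wt}_r(v) = 1$ then $v = 10 \ldots 0$, so the statement is trivial. Let $Z_v(V) \ge 1 - \delta$, where $\mathrm{wt}_r(v) = i \ge 2$. Then for every pair $x, x' = x + v$ we have $Z(V_{\{x,x'\}}) \ge 1 - \varepsilon$, where $\varepsilon = q\delta$. Consider the unit-length vectors $z = (\sqrt{V(y|x)}, y \in \mathcal{Y})$, $z' = (\sqrt{V(y|x')}, y \in \mathcal{Y})$, and let $\theta(z, z')$ be the angle between them. We have $\cos(\theta(z, z')) = Z(V_{\{x,x'\}}) \ge 1 - \varepsilon$, and so $\|z - z'\|^2 = 2 - 2\cos(\theta(z, z')) \le 2\varepsilon$.

Now take a pair of symbols $x_1, x_2 = x_1 + v'$ where $v' \in \mathcal{X}_s$, $s \le i$. There exists a number $t \in \mathcal{X}_{r-i+s}$ such that $v' = tv$. Define $z_1 = (\sqrt{V(y|x_1)}, y \in \mathcal{Y})$ and $z_2 = (\sqrt{V(y|x_2)}, y \in \mathcal{Y})$. Let $w_j = (\sqrt{V(y|x_1 + jv)}, y \in \mathcal{Y})$, $j = 1, \ldots, t - 1$. From the triangle inequality

$$\|z_1 - z_2\| \le \|z_1 - w_1\| + \|w_1 - w_2\| + \cdots + \|w_{t-1} - z_2\| \le t\sqrt{2\varepsilon} \le q\sqrt{2\varepsilon}.$$

We obtain

$$Z(V_{\{x_1,x_2\}}) = \cos(\theta(z_1, z_2)) = 1 - \tfrac{1}{2}\|z_1 - z_2\|^2 \ge 1 - q^2 \varepsilon.$$

Thus we obtain

$$Z_{v'}(V) \ge 1 - q^3 \delta.$$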
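The inequality $Z_{v'}(V) \ge 1 - q^3 (1 - Z_v(V))$ established in this proof can be spot-checked numerically. The sketch below is an illustration under the same assumed bit convention as earlier ($\mathrm{wt}_r(v) = r$ for odd $v$); it generates channels close to a useless one, where the bound is nontrivial, and tests every admissible pair $(v, v')$. The helper names are hypothetical.

```python
import math
import random

def ordered_weight(v, r):
    """wt_r(v) with bit 1 the most significant bit; wt_r(0) = 0."""
    if v == 0:
        return 0
    return r - ((v & -v).bit_length() - 1)

def z_v(V, v):
    """Z_v(V): average over x of sum_y sqrt(V(y|x) V(y|x+v)), addition mod q."""
    q = len(V)
    total = 0.0
    for x in range(q):
        total += sum(math.sqrt(a * b) for a, b in zip(V[x], V[(x + v) % q]))
    return total / q

def near_useless_channel(q, ny, noise, rng):
    """Rows are small random perturbations of the uniform output distribution."""
    rows = []
    for _ in range(q):
        w = [1.0 + noise * (rng.random() - 0.5) for _ in range(ny)]
        s = sum(w)
        rows.append([p / s for p in w])
    return rows

def lemma5_holds(V, tol=1e-9):
    """Check Z_{v'} >= 1 - q^3 (1 - Z_v) whenever wt_r(v') <= wt_r(v)."""
    q = len(V)
    r = q.bit_length() - 1
    z = {v: z_v(V, v) for v in range(1, q)}
    for v in range(1, q):
        delta = 1.0 - z[v]
        for vp in range(1, q):
            if ordered_weight(vp, r) <= ordered_weight(v, r):
                if z[vp] < 1.0 - q ** 3 * delta - tol:
                    return False
    return True

rng = random.Random(4)
results = [lemma5_holds(near_useless_channel(8, 4, noise, rng))
           for noise in (0.5, 0.1, 0.01)]
print(results)
```

For channels that are far from useless the right-hand side becomes negative and the check is vacuous, which is why the perturbation size is kept small here.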



Remark : We can prove the previous lemma in a different 
way by relating the Bhattacharyya distance to the ^1 -distance 
between V{y\xi) and V{y\x2) [15]. Then the estimate 6 = 
5'q^^ can be improved to (5 = 5' {2q)^'^ . 

Lemma 6: For all $j = 1, \ldots, r$

$$Z_{\max,n}^{(j)} \to Z_{\max,\infty}^{(j)} \quad \text{a.e.},$$

where $Z_{\max,\infty}^{(j)}$ is a Bernoulli random variable supported on $\{0, 1\}$.

Proof: For a given channel $V$ denote

$$Z_{\max}^{[i,r]}(V) = \max\big(Z_{\max}^{(i)}(V), Z_{\max}^{(i+1)}(V), \ldots, Z_{\max}^{(r)}(V)\big).$$

Eq. (15) gives us that

$$Z_{\max}^{[r-j,r]}(W^+) = Z_{\max}^{[r-j,r]}(W)^2$$

and (14) implies that

$$Z_{\max}^{[r-j,r]}(W^-) \le q\, Z_{\max}^{[r-j,r]}(W).$$

Hence by Lemma 4 the random variables $Z_{\max,\infty}^{[r-j,r]}$ are well-defined and are Bernoulli 0-1 valued a.e. for all $j = 0, 1, \ldots, r - 1$.

We need to prove the same for $Z_{\max,\infty}^{(r-j)}$. The proof is by induction on $j$. We just established the needed claim for $Z_{\max,n}^{(r)}$. For ease of understanding let us show that this implies the convergence of $Z_{\max,n}^{(r-1)}$. Indeed, $Z_{\max,\infty}^{[r-1,r]}$ is a Bernoulli 0-1 valued random variable. But so is $Z_{\max,\infty}^{(r)}$, so the possibilities are

$$\big(Z_{\max,\infty}^{[r-1,r]}, Z_{\max,\infty}^{(r)}\big) = (1, 1) \text{ or } (1, 0) \text{ or } (0, 0)$$

with probability one (note that (0, 1) is ruled out by the definition of $Z_{\max}^{[r-1,r]}$). If $Z_{\max,\infty}^{(r)} = 1$ then $Z_{\max,\infty}^{(r-1)} = 1$ by Lemma 5 (this statement holds trajectory-wise). If on the other hand, the case that is realized is (1, 0) then $Z_{\max,\infty}^{(r-1)} = 1$ by the definition of $Z_{\max}^{[r-1,r]}$. Finally in the case (0, 0) we clearly have that $Z_{\max,\infty}^{(r-1)} = 0$, both holding trajectory-wise.

The general induction step is almost exactly the same. Assume that we have proved the required convergence for $Z_{\max,n}^{(r-i)}$, $i = 0, 1, \ldots, j - 1$. Assume that $Z_{\max,\infty}^{[r-j,r]} = 0$, then $Z_{\max,\infty}^{(r-j)} = 0$. If on the other hand, $Z_{\max,\infty}^{[r-j,r]} = 1$ then either one of $Z_{\max,\infty}^{(r-i)}$, $i < j$ equals one, and then $Z_{\max,\infty}^{(r-j)} = 1$ by Lemma 5, or $Z_{\max,\infty}^{(r-i)} = 0$ for all $i < j$, and then $Z_{\max,\infty}^{(r-j)} = 1$ by definition of $Z_{\max}^{[r-j,r]}$. ■

Now we are in a position to complete the proof of convergence.

Lemma 7: $Z_{v,n} \to Z_{v,\infty}$ a.e., where $Z_{v,\infty}$ is a (0, 1)-valued random variable whose distribution depends only on the ordered weight $\mathrm{wt}_r(v)$.

Proof: Let $\Omega_i^{(j)} = \{\omega : Z_{\max,n}^{(j)} \to i\}$, where $i = 0, 1$ and $j = 1, \ldots, r$, where some of the events may be empty. For every $\omega \in \Omega_1^{(j)}$, $j = 1, \ldots, r$ we have that for any $\delta > 0$ starting with some $n_0$ the quantity $Z_{\max,n}^{(j)} \ge 1 - \delta$. Thus, for $n \ge n_0$ there exists $v \in \mathcal{X}_j$, possibly depending on $n$, such that $Z_{v,n}(\omega) \ge 1 - \delta$. Then Lemma 5 implies that $Z_{v',n}(\omega) \ge 1 - q^3 \delta$ for all $v' \in \mathcal{X}_j$, so $Z_{v,n}(\omega) \to 1$. At the same time, if $\omega \in \Omega_0^{(j)}$ then $Z_{v,n}(\omega) \to 0$ for all $v \in \mathcal{X}_j$. ■



2) Proof of Part (b) of Theorem 1:

Lemma 8: For any $i = 1, \ldots, r$, the random variable $Z_{i,n}$ converges a.e. to a (0, 1)-valued random variable $Z_{i,\infty}$. Moreover, $Z_{i,\infty} \le Z_{i-1,\infty}$ a.e.

Proof: The first part follows because all the $Z_{v,n}$, $v \in \mathcal{X}_i$ converge to identical copies of the same random variable. Formally, Lemma 7 asserts that $Z_{v,n} \to j$ for every $v \in \mathcal{X}_i$ and every $\omega \in \Omega_j^{(i)}$, $j = 0, 1$. Hence taking the limit $n \to \infty$ in (6) we see that $Z_{i,n} \to j$ on $\Omega_j^{(i)}$, where $P(\Omega_0^{(i)} \cup \Omega_1^{(i)}) = 1$. Let us prove the second part. Suppose that $Z_{i,n} \ge 1 - \varepsilon'$, then using (6) we see that $Z_{v',n} \ge 1 - 2^{i-1}\varepsilon'$ for all $v' \in \mathcal{X}_i$. Lemma 5 implies that $Z_{v,n} \ge 1 - 2^{3r+i-1}\varepsilon'$ for any $v \in \mathcal{X}$ with $\mathrm{wt}_r(v) = l \le i$, and therefore $Z_{l,n} \ge 1 - 2^{3r+i-1}\varepsilon'$. Thus $Z_{i,n}(\omega) \to 1$ implies $Z_{l,n}(\omega) \to 1$ for all $\omega \in \Omega_1^{(i)}$ and all $l \le i$. The second claim of the lemma now follows because $Z_{i,\infty}$ are 0-1 valued for all $i$. ■

We obtain that $Z_{i,\infty}$ is a (0, 1) random variable a.e. and for all $i$, and if $Z_{i,\infty} = 1$ then $Z_{j,\infty} = 1$ for all $1 \le j < i$.

Consider the events $\Psi_i^{(j)} = \{\omega : Z_{j,\infty} = i\}$, $i = 0, 1$; $j = 1, \ldots, r$. We have

$$\Psi_1^{(1)} \supset \Psi_1^{(2)} \supset \cdots \supset \Psi_1^{(r)},$$

$$\Psi_0^{(1)} \subset \Psi_0^{(2)} \subset \cdots \subset \Psi_0^{(r)}.$$

We need to prove that with probability one, the vector $(Z_{i,\infty}, i = 1, \ldots, r)$ takes one of the values (8). With probability one $Z_{r,\infty} = 1$ or 0. If it is equal to 1 then necessarily $Z_{r-1,\infty} = \cdots = Z_{1,\infty} = 1$. Otherwise $Z_{r,\infty} = 0$. In this case it is possible that $Z_{r-1,\infty} = 1$ (in which case $Z_{r-2,\infty} = \cdots = Z_{1,\infty} = 1$) or $Z_{r-1,\infty} = 0$. Of course $P(\Psi_0^{(r-1)} \cup \Psi_1^{(r-1)}) = 1$, so in particular

$$P\Big(\Psi_0^{(r)} \setminus \big(\Psi_0^{(r-1)} \cup (\Psi_1^{(r-1)} \setminus \Psi_1^{(r)})\big)\Big) = 0.$$

If $Z_{r-1,\infty} = 0$ then the possibilities are $Z_{r-2,\infty} = 1$ or 0, up to another event of probability 0, and so on. Thus, the union of the disjoint events given by (8) holds with probability one. Theorem 1 is proved. ■

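3) Proof of Prop. 1: The proof is analogous to the argument in the previous paragraph. The random variable $Z_{r,n} \to Z_{r,\infty}$ a.e. By the Egorov theorem, for any $\gamma > 0$ there are disjoint subsets $\Psi_{0,\gamma}^{(r)} \subset \Psi_0^{(r)}$, $\Psi_{1,\gamma}^{(r)} \subset \Psi_1^{(r)}$ with $P(\Psi_{0,\gamma}^{(r)} \cup \Psi_{1,\gamma}^{(r)}) \ge 1 - \gamma$ on which this convergence is uniform. Take $n_1^{(r)}$ such that $Z_{r,n} \ge 1 - \varepsilon / 2^{4r-1}$ for every $\omega \in \Psi_{1,\gamma}^{(r)}$ and $n \ge n_1^{(r)}$. By Lemma 5 and (6) for every such $\omega$ we have $Z_{i,n} \ge 1 - \varepsilon$ for all $i = 1, \ldots, r - 1$; $n \ge n_1^{(r)}$. This gives rise to the event $B_{r,n}$. Otherwise let $n_0^{(r)}$ be such that $\sup_\omega Z_{r,n} < \varepsilon$ for $\omega \in \Psi_{0,\gamma}^{(r)}$ and $n \ge n_0^{(r)}$. Consider the events $\Psi_{0,\gamma}^{(r-1)} \subset \Psi_0^{(r-1)}$, $\Psi_{1,\gamma}^{(r-1)} \subset \Psi_1^{(r-1)}$ with $P(\Psi_{0,\gamma}^{(r-1)} \cup \Psi_{1,\gamma}^{(r-1)}) \ge 1 - \gamma$ on which $Z_{r-1,n} \to Z_{r-1,\infty}$ uniformly. Choose $n_1^{(r-1)}$ such that $Z_{r-1,n} \ge 1 - \varepsilon / 2^{4r-2}$ for all $n \ge n_1^{(r-1)}$ and all $\omega \in \Psi_{1,\gamma}^{(r-1)}$. For every such $\omega$ we have $Z_{i,n} \ge 1 - \varepsilon$ for all $i = 1, \ldots, r - 2$; $n \ge n_1^{(r-1)}$. Next,

$$P\Big(\Psi_{0,\gamma}^{(r)} \setminus \big(\Psi_{0,\gamma}^{(r-1)} \cup (\Psi_{1,\gamma}^{(r-1)} \setminus \Psi_{1,\gamma}^{(r)})\big)\Big) \le 2\gamma.$$

We continue in this manner until we construct all the $r + 1$ events $B_{k,n}$. For this, $n$ should be taken sufficiently large, $n \ge \max_k \max(n_0^{(k)}, n_1^{(k)})$. By taking $\gamma = \delta / r$ we can ensure that $P(\bigcup_k B_{k,n}) \ge 1 - \delta$. This concludes the proof.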

Remark: For binary-input channels, the transmitted bits in the limit are transmitted either perfectly or carry no information about the message. Şaşoğlu et al. [3] observed that $q$-ary codes constructed using Arikan's kernel $H_2$ share this property for transmitted symbols only if $q$ is prime. Otherwise, [3] notes, the symbols can polarize to states that carry partial information about the transmission. In particular, they give an example of a quaternary-input channel $W : \{0, 1, 2, 3\} \to \{0, 1\}$ with $W(0|0) = W(0|2) = W(1|1) = W(1|3) = 1$. This channel has capacity 1 bit. Computing the channels $W^+$ and $W^-$ we find that they are equivalent to the original channel $W$. The conclusion reached in [3] was that there are nonbinary channels that do not polarize under the action of $H_2$.

We observe that the above channel corresponds to the extremal configuration 10 in (8) (the other two configurations arise with probability 0), and therefore has to be, and is, a stable point of the channel combining operation. It is possible to reach capacity by transmitting the least significant bit of every symbol.
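The stability of this quaternary example under the transformation (2)-(3) can be confirmed by direct computation. The following sketch is an illustration with hypothetical helper names; it assumes the channel is stored as `W[x][y]` and that $\mathrm{wt}_r(v) = r$ for odd $v$. It evaluates the symmetric capacity and the vector $(Z_1, Z_2)$ for $W$, $W^-$ and $W^+$.

```python
import math

def capacity(W):
    """Symmetric capacity I(W) in bits."""
    q, ny = len(W), len(W[0])
    total = 0.0
    for y in range(ny):
        avg = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log2(W[x][y] / avg)
    return total

def minus_transform(W):
    """W^-(y1,y2|u1), Eq. (2), with the output pair flattened to one index."""
    q, ny = len(W), len(W[0])
    out = [[0.0] * (ny * ny) for _ in range(q)]
    for u1 in range(q):
        for u2 in range(q):
            for y1 in range(ny):
                for y2 in range(ny):
                    out[u1][y1 * ny + y2] += W[(u1 + u2) % q][y1] * W[u2][y2] / q
    return out

def plus_transform(W):
    """W^+(y1,y2,u1|u2), Eq. (3), with the output triple flattened to one index."""
    q, ny = len(W), len(W[0])
    out = [[0.0] * (ny * ny * q) for _ in range(q)]
    for u2 in range(q):
        for u1 in range(q):
            for y1 in range(ny):
                for y2 in range(ny):
                    out[u2][(y1 * ny + y2) * q + u1] = (
                        W[(u1 + u2) % q][y1] * W[u2][y2] / q)
    return out

def z_vector(W):
    """(Z_1(W), ..., Z_r(W)) as defined in Eq. (6)."""
    q = len(W)
    r = q.bit_length() - 1
    sums, counts = [0.0] * r, [0] * r
    for v in range(1, q):
        weight = r - ((v & -v).bit_length() - 1)  # ordered weight of v
        zv = sum(math.sqrt(a * b)
                 for x in range(q)
                 for a, b in zip(W[x], W[(x + v) % q])) / q
        sums[weight - 1] += zv
        counts[weight - 1] += 1
    return [s / c for s, c in zip(sums, counts)]

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # output = input mod 2
for name, ch in (("W", W), ("W-", minus_transform(W)), ("W+", plus_transform(W))):
    print(name, capacity(ch), z_vector(ch))
```

All three channels have the same capacity and the same vector $(Z_1, Z_2)$, in agreement with the remark that this channel is a fixed point of the combining operation.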

Paper [3] went on to show that for every $n \ge 1$ there exists a permutation $\pi_n : \mathcal{X} \to \mathcal{X}$ such that the kernels $H_2(n) : (u, v) \to (u + v, \pi_n(v))$ lead to channels that polarize to perfect or fully noisy. While the result of [3] holds for any $q$, in the case of $q = 2^r$ this means that configurations $00 \ldots 0$ and $11 \ldots 1$ arise with probability $1 - I(W)$ and $I(W)$ respectively, while all the other configurations have probability zero.

E. Rate of polarization and error probability of decoding 

The following theorem, due to Arikan and Telatar [16], is 
useful in quantifying the rate of convergence of the channels 
Wn to one of the extremal configurations (8). 

Theorem 3: [16] Suppose that a random process $U_n$, $n \ge 0$
satisfies the conditions (i)-(iii) of Lemma 4 and that (iv) $U_n$
converges a.e. to a $\{0,1\}$-valued random variable $U_\infty$ with
$P(U_\infty = 0) = p$. Then for any $\alpha \in (0, 1/2)$

$$\lim_{n\to\infty} P\big(U_n < 2^{-2^{\alpha n}}\big) = p. \qquad (24)$$

If condition (iii) is replaced with (iii') $U_n \le U_{n+1}$ and $U_0 > 0$,
then for any $\alpha > 1/2$,

$$\lim_{n\to\infty} P\big(U_n < 2^{-2^{\alpha n}}\big) = 0.$$

Note that, as a consequence of Lemma 4, assumption (iv) in
this theorem is superfluous in that it follows from (i)-(iii).
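To get a feeling for the double-exponential threshold in this theorem, the following sketch enumerates a simple process of this type exactly. It is an illustration under an assumption: the recursion $Z \to Z^2$ or $Z \to 2Z - Z^2$ (each with probability 1/2), which is the Bhattacharyya process of a binary erasure channel, is used as a stand-in example; the function names are hypothetical. For this process $P(Z_\infty = 0) = 1 - \varepsilon$ when $Z_0 = \varepsilon$.

```python
def erasure_process(eps, n):
    """Return all 2**n equiprobable values of Z_n for Z -> Z*Z or Z -> 2Z - Z*Z."""
    values = [eps]
    for _ in range(n):
        nxt = []
        for z in values:
            nxt.append(z * z)            # branch that squares Z
            nxt.append(2.0 * z - z * z)  # branch that at most doubles Z
        values = nxt
    return values


def fraction_below(values, alpha, n):
    """Fraction of paths with Z_n < 2^(-2^(alpha * n))."""
    threshold = 2.0 ** (-(2.0 ** (alpha * n)))
    return sum(1 for z in values if z < threshold) / len(values)


eps, n = 0.5, 16
z_values = erasure_process(eps, n)
fractions = {alpha: fraction_below(z_values, alpha, n) for alpha in (0.3, 0.45, 0.6)}
for alpha, frac in fractions.items():
    print(f"alpha = {alpha}: fraction of paths below the threshold = {frac:.4f}")
```

At a finite $n$ the fractions only hint at the limits; the theorem says that for $\alpha < 1/2$ they tend to $p$ and for $\alpha > 1/2$ they tend to 0 as $n$ grows.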

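Processes $Z^{(r)}_{\max,n}$ and $Z^{[r-j,r]}_{\max,n}$, $j = 0, \dots, r-1$ satisfy
conditions (i)-(iii) of Lemma 4. Hence the above theorem
gives the rate of convergence of each of them to zero. We
argue that the convergence rate of $Z^{(r-j)}_{\max,n}$, $j \ge 1$ to zero is
also governed by Theorem 3. Indeed, let
$\Omega_i^{[r-j,r]} = \{\omega : Z^{[r-j,r]}_{\max,n} \to i\}$ and
$\Omega_i^{(r-j)} = \{\omega : Z^{(r-j)}_{\max,n} \to i\}$, $i = 0, 1$. Then

$$\Omega_0^{(r-j)} \supset \Omega_0^{[r-j,r]} \quad\text{and}\quad \Omega_1^{(r-j)} = \Omega_1^{[r-j,r]}, \qquad (25)$$

the last equality because by Lemma 5, $Z^{[r-j,r]}_{\max,n} \to 1$ implies
$Z^{(r-j)}_{\max,n} \to 1$ on every trajectory. As a consequence of (25)
we have that $P(\Omega_0^{(r-j)} \setminus \Omega_0^{[r-j,r]}) = 0$. Hence
$P(Z^{(r-j)}_{\max,\infty} = 0) = P(Z^{[r-j,r]}_{\max,\infty} = 0)$. Denote this common value by $p_j$. The
random variable $Z^{[r-j,r]}_{\max,n}$ satisfies a condition of the form (24)
with $p = p_j$. We obtain that for any $\alpha \in (0, 1/2)$

$$\lim_{n\to\infty} P\big(Z^{(r-j)}_{\max,n} < 2^{-2^{\alpha n}}\big) \ge \lim_{n\to\infty} P\big(Z^{[r-j,r]}_{\max,n} < 2^{-2^{\alpha n}}\big) = p_j.$$

Of course if $Z^{(r-j)}_{\max,n}$ is small then so is every $Z_{v,n}$ for $v \in \mathcal{X}_{r-j}$.
We conclude as follows.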

Proposition 4: For any $\alpha \in (0, 1/2)$ and any $v \in \mathcal{X}_j$, $j = 1, 2, \dots, r$,

$$\lim_{n\to\infty} P\big(Z_{v,n} < 2^{-2^{\alpha n}}\big) \ge p_j.$$

This result enables us to estimate the probability of decoding 
error under successive cancellation decoding. To do this, we 
extend the argument of [1] to nonbinary alphabets. 



The following statement follows directly from the previ- 
ously established results, notably Proposition 2. 

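Theorem 4: Let $0 < \alpha < 1/2$. For any DMC $W : \mathcal{X} \to \mathcal{Y}$
with $I(W) > 0$ and any $R < I(W)$ there exists a sequence
of $r$-tuples of disjoint subsets $A_{0,N}, \dots, A_{r-1,N}$ of $[N]$ such
that $\sum_k |A_{k,N}|(r - k) \ge NR$ and $Z_v(W_N^{(i)}) < 2^{-N^{\alpha}}$ for all
$i \in A_{k,N}$, all $v \in \bigcup_{l=k+1}^{r} \mathcal{X}_l$, and all $k = 0, 1, \dots, r-1$.

Let

$$\mathcal{E}_i = \{(u_1^N, y_1^N) : \hat u_1^{i-1} = u_1^{i-1},\ \hat u_i \ne u_i\}.$$

Then the block error probability of decoding is defined as

$$P_e = P(\mathcal{E}) = P\Big(\bigcup_i \mathcal{E}_i\Big).$$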

The next theorem is the main result of this section. 

Theorem 5: Let $0 < \alpha < 1/2$ and let $0 < R < I(W)$,
where $W : \mathcal{X} \to \mathcal{Y}$ is a DMC. The best achievable
probability of block error under successive cancellation
decoding at block length $N = 2^n$ and rate $R$ satisfies

$$P_e = O(2^{-N^{\alpha}}).$$

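Proof: Let

$$\mathcal{E}_{i,v} = \{(u_1^N, y_1^N) : W_N^{(i)}(y_1^N, u_1^{i-1}|u_i) \le W_N^{(i)}(y_1^N, u_1^{i-1}|u_i + v)\}.$$

For a fixed value of $a_1^k = (a_1, a_2, \dots, a_k) \in \{0,1\}^k$ let us
define $\mathcal{X}(a_1^k) = \{x \in \mathcal{X} : x_1^k = a_1^k\}$. Notice that the decoder
finds $\hat u_i$, $i \in A_{k,N}$ by taking the maximum over the symbols
$x \in \mathcal{X}(a_1^k)$. Then we obtain

$$\mathcal{E}_i \subset \bigcup_{v \in \mathcal{X}(a_1^k)} \mathcal{E}_{i,v}.$$

Using (5), we obtain

$$
\begin{aligned}
P(\mathcal{E}_i) &\le \sum_{v \in \mathcal{X}(a_1^k)} P(\mathcal{E}_{i,v}) \\
&= \sum_{v \in \mathcal{X}(a_1^k)} \sum_{u_1^N, y_1^N} \frac{1}{q^N} W_N(y_1^N|u_1^N)\, 1_{\mathcal{E}_{i,v}}(u_1^N, y_1^N) \\
&\le \sum_{v \in \mathcal{X}(a_1^k)} \sum_{u_1^N, y_1^N} \frac{1}{q^N} W_N(y_1^N|u_1^N) \sqrt{\frac{W_N^{(i)}(y_1^N, u_1^{i-1}|u_i + v)}{W_N^{(i)}(y_1^N, u_1^{i-1}|u_i)}} \\
&= \sum_{v \in \mathcal{X}(a_1^k)} \frac{1}{q} \sum_{u_i} Z\big(W^{(i)}_{N,\{u_i, u_i+v\}}\big) \\
&= \sum_{v \in \mathcal{X}(a_1^k)} Z_v(W_N^{(i)}).
\end{aligned}
$$

Thus the decoding error is bounded by

$$P(\mathcal{E}) \le \sum_{i \in A_{0,N} \cup \dots \cup A_{r-1,N}} \ \sum_{v \in \mathcal{X}(a_1^k)} Z_v(W_N^{(i)}).$$

By Theorem 4, for any $R < I(W)$ there exists a sequence
of $r$-tuples of disjoint subsets $A_{0,N}, \dots, A_{r-1,N}$ with
$\sum_k |A_{k,N}|(r - k) \ge NR$ such that every term $Z_v(W_N^{(i)})$ in this
sum is exponentially small in the sense of Theorem 4 (the sum has at
most $qN$ terms, a factor that is absorbed by applying Theorem 4 with an
exponent slightly larger than $\alpha$), and thus we obtain that
$P(\mathcal{E}) = O(2^{-N^{\alpha}})$. ■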

III. Ordered Channels 

To compute a few examples, consider "ordered symmetric
channels," called so because they provide a natural counterpart
to the combinatorial definition of the ordered distance [8].
A simple example is given by the ordered erasure channel,
defined as $W_r : \mathbb{F}_2^r \to (\mathbb{F}_2 \cup \{?\})^r$ where

$$
W_r(y|x) = \begin{cases}
\varepsilon_0, & y = x, \\
\varepsilon_i, & y = (?\,?\cdots?\,x_{i+1} \cdots x_r),\ 1 \le i \le r
\end{cases}
$$

and $W_r(y|x) = 0$ if $y$ does not contain any erased coordinates
and $y \ne x$. Its capacity equals $r - \sum_{i=1}^{r} i\varepsilon_i$ and is attained by
sending $r$ independent streams of data encoded for binary erasure
channels with erasure probabilities $\sum_{j=i}^{r} \varepsilon_j$, $i = 1, \dots, r$.
Therefore, sending $r$ independent polar codewords over the $r$
bit channels, one can approach the capacity of the channel.

Despite the fact that this example is trivial, it already shows
the domination pattern observed in Theorem 1. Namely, it is
easy to prove directly that $Z_{j,\infty} \ge Z_{i,\infty}$ a.s. for all $i > j$,
thereby establishing the result of Lemma 8. For that it suffices
to observe that the erasure in higher-numbered bits implies that
all the lower-numbered bits are erased with probability 1. We
include two examples. In Fig. 1, $r = 2$, and $\varepsilon_0 = 0.5$, $\varepsilon_1 = 0.4$,
$\varepsilon_2 = 0.1$. In Fig. 2, $r = 9$ and $\varepsilon_i = 0.1$, $i = 0, 1, \dots, 9$.
Note that an output with $i$ erased coordinates carries $r - i$ bits, so
the proportion of the channels with capacity $r - i$ bits,
$i = 0, 1, \dots, r$, converges to $\varepsilon_i$.
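The multilevel behaviour of the ordered erasure channel can be reproduced with a few lines of code. The sketch below rests on an assumption that is not spelled out in the text: under one polarization step the number of erased leading coordinates of $W^-$ is the maximum, and that of $W^+$ the minimum, of two independent erasure levels, so each tail probability $g_i = \sum_{j \ge i} \varepsilon_j$ evolves as $g \to 1-(1-g)^2$ or $g \to g^2$. The function names are illustrative. With the parameters of Fig. 1 the mean capacity stays at $r - \sum_i i\varepsilon_i = 1.4$ bits, and the domination of lower-numbered bits by higher-numbered ones is preserved at every step.

```python
from collections import Counter


def tails(eps):
    """g[i-1] = eps_i + ... + eps_r: probability that coordinate i is erased."""
    r = len(eps) - 1
    return tuple(sum(eps[i:]) for i in range(1, r + 1))


def polarize(eps, n):
    """Tail vectors of all 2**n virtual channels after n polarization steps."""
    channels = [tails(eps)]
    for _ in range(n):
        nxt = []
        for g in channels:
            # Assumed W^-: erasure level is the max of two independent levels.
            nxt.append(tuple(1.0 - (1.0 - t) * (1.0 - t) for t in g))
            # Assumed W^+: erasure level is the min of two independent levels.
            nxt.append(tuple(t * t for t in g))
        channels = nxt
    return channels


def capacity(g):
    """Capacity in bits: each coordinate i contributes 1 - g_i."""
    return sum(1.0 - t for t in g)


eps = (0.5, 0.4, 0.1)   # parameters of Fig. 1 (r = 2)
n = 12
channels = polarize(eps, n)
caps = [capacity(g) for g in channels]
mean_capacity = sum(caps) / len(caps)
histogram = Counter(round(c) for c in caps)

print("mean capacity:", mean_capacity)
for bits in sorted(histogram):
    print(f"capacity ~ {bits} bits: proportion {histogram[bits] / len(caps):.3f}")
```

Rounding each capacity to the nearest integer is only a crude way to bin the channels at a finite $n$; the proportions approach their limits as $n$ grows.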

Another example is given by the ordered symmetric channel
[8] which is a DMC $W : \{0,1\}^r \to \{0,1\}^r$ defined by the
matrix $W(y|x)$ where

$$W(y|x) = \varepsilon_j\, 2^{-(j-1)} \qquad (26)$$

for all pairs $y, x$ such that $d_r(x, y) = j$, $j = 1, \dots, r$, and
where $W(x|x) = \varepsilon_0$ for all $x \in \mathcal{X}$. The ordered symmetric
channel models transmission over $r$ parallel links such that,
if in a given time slot a bit is received incorrectly, the bits
with indices lower than it are equiprobable. This system was
proposed in [19] as an abstraction of transmission in wireless
fading environment. The capacity of the channel equals

$$I(W) = r + \varepsilon_0 \log_2 \varepsilon_0 + \sum_{i=1}^{r} \varepsilon_i \log_2 \frac{\varepsilon_i}{2^{i-1}}.$$

By Theorem 1, $q$-ary polar codes, $q = 2^r$, can be used to
transmit at rates close to capacity on this channel; moreover,
the domination pattern that emerges exactly matches the
fading nature of the bundle of $r$ parallel channels, achieving
the capacity of the system discussed above.




[Figure 1: plot against Channel Index, horizontal axis from 2048 to 16384.]

Fig. 1. 3-level polarization on the ordered erasure channel $W : \mathcal{X} \to \mathcal{Y}$,
$\mathcal{X} = \{00, 01, 10, 11\}$ with transition probabilities $\varepsilon_0 := W(00|00) = 0.5$,
$\varepsilon_1 := W(?x_2|x_1x_2) = 0.4$, $\varepsilon_2 := W(??|x_1x_2) = 0.1$, for all
$x_1, x_2 \in \{0,1\}$. In this example it is easy to see that
$P(I_\infty = 2 - i) = \varepsilon_i$, $i = 0, 1, 2$.

[Figure 2: plot not reproduced.]

Fig. 2. 10-level polarization on the ordered erasure channel $W : \{0,1\}^9 \to \mathcal{Y}$
with transition probabilities $\varepsilon_i = 0.1$, $i = 0, 1, \dots, 9$.



IV. Conclusion 

The result of this paper offers more detailed information
about polarization on $q$-ary channels, $q = 2^r$. The multilevel
polarization adds flexibility to the design of the transmission 
scheme in that we can adjust the number of symbols that carry 
a given number of bits to a specified proportion of the overall 
transmission as long as the total number of bits is fixed. This 
could be useful in the design of signal constellations for coded 
modulation, including BICM [17], [18] as well as in other 
communication problems that can benefit from nonuniform 
symbol sets. 
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Makowski, and Himanshu Tyagi (UMD) for useful discussions 
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grants CCF0916919, CCF0830699, and DMSl 117852. 

Appendix 

The proof of (10): We shall break the expression for $I(W)$
into a sum of symmetric capacities of B-DMCs.

Let $z = (z_1, \dots, z_k)$ be a $k$-tuple of symbols from $\mathcal{X}$.
Define the probability distribution $P(y|z) = \frac{1}{k}\sum_{i=1}^{k} W(y|z_i)$.
Define a B-DMC $W^{(k)}_{\{z^{(1)}, z^{(2)}\}} : \mathcal{X}^k \to \mathcal{Y}$ with inputs
$z^{(i)} \in \mathcal{X}^k$, where the transition $z^{(i)} \to y$ is given by
$P(y|z^{(i)})$, $i = 1, 2$.

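Lemma 9: The Bhattacharyya parameter of the channel
$W^{(k)}_{\{z^{(1)}, z^{(2)}\}}$, where $z^{(1)} = (x_1, \dots, x_k)$, $z^{(2)} = (x_{k+1}, \dots, x_{2k})$,
can be lower bounded by

$$Z\big(W^{(k)}_{\{z^{(1)}, z^{(2)}\}}\big) \ge \frac{1}{k}\sum_{j=1}^{k} Z\big(W_{\{x_j, x_{f(j)}\}}\big) \qquad (27)$$

for any $f$ which is a one-to-one mapping from the set
$\{1, 2, \dots, k\}$ to $\{k+1, \dots, 2k\}$.

Proof: It suffices to prove the above inequality for some
one-to-one mapping. Let $f(i) = k + i$. For brevity denote
$w_{i,y} = W(y|x_i)$. We have

$$Z\big(W^{(k)}_{\{z^{(1)}, z^{(2)}\}}\big) = \frac{1}{k}\sum_{y} \sqrt{\Big(\sum_{i=1}^{k} w_{i,y}\Big)\Big(\sum_{i=k+1}^{2k} w_{i,y}\Big)}$$

while the right-hand side of (27) is

$$\frac{1}{k}\sum_{j=1}^{k} Z\big(W_{\{x_j, x_{f(j)}\}}\big) = \frac{1}{k}\sum_{y}\sum_{i=1}^{k} \sqrt{w_{i,y}\, w_{k+i,y}}.$$

The Cauchy-Schwarz inequality gives us

$$\Big(\sum_{i=1}^{k} w_{i,y}\Big)\Big(\sum_{i=k+1}^{2k} w_{i,y}\Big) \ge \Big(\sum_{i=1}^{k} \sqrt{w_{i,y}\, w_{k+i,y}}\Big)^2,$$

hence the lemma.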
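The bound of Lemma 9 is easy to test numerically. The sketch below is a hedged illustration: the randomly generated channel, the helper names and the chosen index tuples are all assumptions made for the example. It compares the Bhattacharyya parameter of the two mixture distributions $P(\cdot|z^{(1)})$, $P(\cdot|z^{(2)})$ with the average of pairwise Bhattacharyya parameters for every one-to-one pairing of the two tuples.

```python
import math
import random
from itertools import permutations


def random_channel(num_inputs, num_outputs, rng):
    """rows[x][y] = W(y|x) for a randomly drawn channel (illustrative only)."""
    rows = []
    for _ in range(num_inputs):
        raw = [rng.random() for _ in range(num_outputs)]
        total = sum(raw)
        rows.append([v / total for v in raw])
    return rows


def bhattacharyya(p, q):
    """Bhattacharyya parameter of a binary-input channel with output laws p and q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def mixture(W, z):
    """P(y|z) = (1/k) * sum_i W(y|z_i) for a k-tuple z of inputs."""
    k = len(z)
    return [sum(W[x][y] for x in z) / k for y in range(len(W[0]))]


def lemma9_gap(W, z1, z2):
    """Smallest value of (left side - right side) over all one-to-one pairings."""
    k = len(z1)
    lhs = bhattacharyya(mixture(W, z1), mixture(W, z2))
    worst = float("inf")
    for perm in permutations(range(k)):
        rhs = sum(bhattacharyya(W[z1[j]], W[z2[perm[j]]]) for j in range(k)) / k
        worst = min(worst, lhs - rhs)
    return worst


rng = random.Random(0)
W = random_channel(num_inputs=8, num_outputs=5, rng=rng)
gaps = [
    lemma9_gap(W, (0,), (1,)),
    lemma9_gap(W, (0, 1), (2, 3)),
    lemma9_gap(W, (0, 1, 2, 3), (4, 5, 6, 7)),
]
print(gaps)
```

A nonnegative gap for every pairing is what the Cauchy-Schwarz step in the proof guarantees; for $k = 1$ the two sides coincide.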



Let us introduce some notation. Given $z = (z_1, \dots, z_k) \in \mathcal{X}^k$,
let $z \oplus x = (z_1 \oplus x, \dots, z_k \oplus x)$ where $\oplus$ is a bit-wise
modulo-2 summation. In the next lemma we consider
B-DMCs $W^{(k)}_{\{z_m^{(1)}, z_m^{(2)}\}} : \mathcal{X}^k \to \mathcal{Y}$, $k = 2^{m-1}$, $m = 1, \dots, r$,
with inputs of special form. Namely, $z_1^{(1)} = x_1$, $z_2^{(1)} = (x_1, x_1 \oplus x_2)$;
$z_3^{(1)} = (x_1, x_1 \oplus x_2, x_1 \oplus x_3, x_1 \oplus x_2 \oplus x_3)$, and
generally, $z_m^{(1)}$ is formed of $x_1$ plus all the possible sums of
the vectors $x_2, \dots, x_m$ with 0-1 coefficients, including the
empty one. Finally, $z_m^{(2)} = z_m^{(1)} \oplus x_{m+1}$.

For $m = 0, 1, \dots, r-1$ introduce the set $A = A(x_1, \dots, x_{m+1}) \subset \mathcal{X}^{m+1}$
as follows:

$$A = \Big\{(x_1, \dots, x_{m+1}) \in \mathcal{X}^{m+1} \ \Big|\ x_1 \in \mathcal{X};\ x_2 \in \mathcal{X} \setminus \{0\};\ x_j \ne \sum_{i=2}^{j-1} a_i x_i \text{ for all choices of } a_i \in \{0,1\},\ j = 3, \dots, m+1 \Big\}.$$

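We need the following technical lemma.
Lemma 10:

$$I(W) = \sum_{m=1}^{r} \Big(\frac{1}{2^r} \prod_{j=1}^{m} \frac{1}{2^r - 2^{j-1}}\Big) \sum_{A(x_1, \dots, x_{m+1})} I\big(W^{(k)}_{\{z_m^{(1)}, z_m^{(2)}\}}\big) \qquad (28)$$

where the number $k$, the vectors $z_m^{(1)}, z_m^{(2)}$, and the set
$A(x_1, \dots, x_{m+1})$ are defined before the lemma.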


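Proof: First we express the capacity of $W$ as the sum of
symmetric capacities of B-DMCs. With $P(y) = 2^{-r}\sum_{x} W(y|x)$,

$$
\begin{aligned}
I(W) &= \frac{1}{2^r}\sum_{y}\sum_{x} W(y|x)\log\frac{W(y|x)}{P(y)} \\
&= \frac{1}{2^r(2^r-1)}\sum_{y}\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} \Big\{\frac12 W(y|x_1)\log\frac{W(y|x_1)}{P(y)} + \frac12 W(y|x_1 \oplus x_2)\log\frac{W(y|x_1 \oplus x_2)}{P(y)}\Big\} \\
&= \frac{1}{2^r(2^r-1)}\sum_{y}\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} \Big\{\frac12 W(y|x_1)\log\frac{W(y|x_1)}{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))} \\
&\qquad + \frac12 W(y|x_1 \oplus x_2)\log\frac{W(y|x_1 \oplus x_2)}{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))} \\
&\qquad + \frac12\big(W(y|x_1) + W(y|x_1 \oplus x_2)\big)\log\frac{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))}{P(y)}\Big\} \\
&= \frac{1}{2^r(2^r-1)}\Big\{\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} I\big(W_{\{x_1, x_1 \oplus x_2\}}\big) + T_2\Big\}
\end{aligned}
$$

where

$$T_2 = \sum_{y}\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} \frac12\big(W(y|x_1) + W(y|x_1 \oplus x_2)\big)\log\frac{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))}{P(y)}.$$

Observe that the condition $x_2 \ne 0$ is needed in order to obtain
the expression for $I(W_{\{x_1, x_1 \oplus x_2\}})$.

We will apply the same technique repeatedly. In the next
step we add another sum, this time on $x_3$ which has to satisfy
the conditions $x_3 \ne 0$, $x_3 \ne x_2$. We have

$$
\begin{aligned}
T_2 &= \frac{1}{2(2^r-2)}\sum_{y}\sum_{A(x_1,x_2,x_3)} \Big\{\frac12\big(W(y|x_1) + W(y|x_1 \oplus x_2)\big)\log\frac{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))}{P(y)} \\
&\qquad + \frac12\big(W(y|x_1 \oplus x_3) + W(y|x_1 \oplus x_2 \oplus x_3)\big)\log\frac{\frac12(W(y|x_1 \oplus x_3) + W(y|x_1 \oplus x_2 \oplus x_3))}{P(y)}\Big\} \\
&= \frac{1}{2(2^r-2)}\sum_{y}\sum_{A(x_1,x_2,x_3)} \Big\{\frac12\big(W(y|x_1) + W(y|x_1 \oplus x_2)\big)\log\frac{\frac12(W(y|x_1) + W(y|x_1 \oplus x_2))}{B} \\
&\qquad + \frac12\big(W(y|x_1 \oplus x_3) + W(y|x_1 \oplus x_2 \oplus x_3)\big)\log\frac{\frac12(W(y|x_1 \oplus x_3) + W(y|x_1 \oplus x_2 \oplus x_3))}{B} + 2B\log\frac{B}{P(y)}\Big\}
\end{aligned}
$$

where $B = \frac14\big(W(y|x_1) + W(y|x_1 \oplus x_2) + W(y|x_1 \oplus x_3) + W(y|x_1 \oplus x_2 \oplus x_3)\big)$.

By now it is clear what we want to accomplish. Let us
again take the sum on $y$ inside. Recalling the definition of the
channel $W^{(k)}$ before Lemma 9, we obtain

$$T_2 = \frac{1}{2^r-2}\Big\{\sum_{A(x_1,x_2,x_3)} I\big(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}\big) + T_3\Big\};$$

here $I(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}})$ is the symmetric capacity of the B-DMC
$W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}$, with $z_2^{(1)} = \{x_1, x_1 \oplus x_2\}$ and
$z_2^{(2)} = \{x_1 \oplus x_3, x_1 \oplus x_2 \oplus x_3\}$, and $T_3$ is the term remaining in
the expression for $T_2$ upon isolating this capacity:

$$T_3 = \sum_{y}\sum_{A(x_1,x_2,x_3)} B\log\frac{B}{P(y)}.$$

Now repeat the above trick for $T_3$, namely, average over all
the linear combinations that this time include the vector $x_4$
and isolate the symmetric capacity of the channel $W^{(4)}$ that
arises. Proceeding in this manner, we obtain

$$
\begin{aligned}
I(W) &= \frac{1}{2^r(2^r-1)}\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} I\big(W_{\{x_1, x_1 \oplus x_2\}}\big) + \frac{1}{2^r(2^r-1)(2^r-2)}\sum_{A(x_1,x_2,x_3)} I\big(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}\big) + \cdots \\
&= \sum_{m=1}^{r} \Big(\frac{1}{2^r} \prod_{j=1}^{m} \frac{1}{2^r - 2^{j-1}}\Big) \sum_{A(x_1, \dots, x_{m+1})} I\big(W^{(k)}_{\{z_m^{(1)}, z_m^{(2)}\}}\big)
\end{aligned}
$$

where the notation $z_m^{(1)}, z_m^{(2)}, A(x_1, \dots, x_{m+1})$ is introduced
before the statement of the lemma. ■

We continue with the proof of inequality (10). The term
with $m = 1$ in (28) equals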

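$$
\begin{aligned}
\frac{1}{2^r(2^r-1)}\sum_{\substack{x_1,x_2 \\ x_2 \ne 0}} I\big(W_{\{x_1, x_1 \oplus x_2\}}\big)
&= \frac{1}{2^r(2^r-1)}\sum_{d=1}^{r}\sum_{\substack{x_1,x_2 \\ \mathrm{wt}_r(x_2) = d}} I\big(W_{\{x_1, x_1 \oplus x_2\}}\big) \\
&\le \frac{1}{2^r(2^r-1)}\sum_{d=1}^{r}\sum_{\substack{x_1,x_2 \\ \mathrm{wt}_r(x_2) = d}} \sqrt{1 - Z\big(W_{\{x_1, x_1 \oplus x_2\}}\big)^2} \\
&\le \frac{1}{2^r(2^r-1)}\sum_{d=1}^{r} 2^{r+d-1}\sqrt{1 - \Big(\frac{1}{2^{r+d-1}}\sum_{\substack{x_1,x_2 \\ \mathrm{wt}_r(x_2) = d}} Z\big(W_{\{x_1, x_1 \oplus x_2\}}\big)\Big)^2} \\
&= \frac{1}{2^r-1}\sum_{d=1}^{r} 2^{d-1}\sqrt{1 - Z_d^2}
\end{aligned}
$$

where the first inequality is from the relation between the
symmetric capacity and the Bhattacharyya parameter of B-DMCs [1],
and the second inequality follows from the fact
that the function $\sqrt{1 - x^2}$ is concave for $0 \le x \le 1$.

The terms with $m \ge 2$ in (28) will be estimated using
Lemma 9. We will choose the map $f$ so that the $r$-vector

$$\alpha(f) = \big(z_m^{(1)}\big)_s \oplus \big(z_m^{(2)}\big)_{f(s)}$$

does not depend on $s$. For instance, one such map is given
in Lemma 9. Moreover, out of all such mappings we take the
one for which $\mathrm{wt}_r(\alpha(f))$ is the smallest. Then the second term
becomes

$$
\begin{aligned}
\frac{1}{2^r(2^r-1)(2^r-2)}\sum_{A(x_1,x_2,x_3)} I\big(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}\big)
&\le \frac{1}{2^r(2^r-1)(2^r-2)}\sum_{A(x_1,x_2,x_3)} \sqrt{1 - Z\big(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}\big)^2} \\
&\le \frac{1}{2^r(2^r-1)(2^r-2)}\sum_{A(x_1,x_2,x_3)} \sqrt{1 - \Big(\frac{D}{2}\Big)^2} \\
&\le \frac{1}{2^r(2^r-1)(2^r-2)}\sum_{d=1}^{r}\sum_{\substack{A(x_1,x_2,x_3) \\ \mathrm{wt}_r(x_3) = d}} \sqrt{1 - \Big(\frac{D}{2}\Big)^2} \\
&\le \frac{1}{2^r(2^r-1)(2^r-2)}\sum_{d=1}^{r} 2^r a_d \sqrt{1 - \Big(\frac{1}{2^{r+1} a_d}\sum_{\substack{A(x_1,x_2,x_3) \\ \mathrm{wt}_r(x_3) = d}} D\Big)^2} \\
&\le \frac{1}{(2^r-1)(2^r-2)}\sum_{d=1}^{r} a_d\sqrt{1 - Z_d^2}
\end{aligned}
$$

where

$$D = Z\big(W_{\{x_1, x_1 \oplus x_3\}}\big) + Z\big(W_{\{x_1 \oplus x_2, x_1 \oplus x_2 \oplus x_3\}}\big),$$

$$a_d = 2^{d-1}\big(2^{r+1} - 3 \cdot 2^{d-1} - 1\big),$$

which is the number of terms with $\mathrm{wt}_r(x_3) = d$, $x_1 = 0$
under the given condition. Repeating this process, we obtain
the claimed result. The full calculation is cumbersome, but its
essence is captured in the example for $r = 3$ which we write
out in full:

$$
\begin{aligned}
I(W) &= \sum_{m=1}^{3} \Big(\frac{1}{2^3} \prod_{j=1}^{m} \frac{1}{2^3 - 2^{j-1}}\Big) \sum_{A(x_1, \dots, x_{m+1})} I\big(W^{(k)}_{\{z_m^{(1)}, z_m^{(2)}\}}\big) \\
&= \frac{1}{8 \cdot 7}\sum_{A(x_1,x_2)} I\big(W_{\{x_1, x_1 \oplus x_2\}}\big) + \frac{1}{8 \cdot 7 \cdot 6}\sum_{A(x_1,x_2,x_3)} I\big(W^{(2)}_{\{z_2^{(1)}, z_2^{(2)}\}}\big) + \frac{1}{8 \cdot 7 \cdot 6 \cdot 4}\sum_{A(x_1,x_2,x_3,x_4)} I\big(W^{(4)}_{\{z_3^{(1)}, z_3^{(2)}\}}\big) \\
&\le \frac{1}{7}\Big(\sqrt{1 - Z_1^2} + 2\sqrt{1 - Z_2^2} + 4\sqrt{1 - Z_3^2}\Big) \\
&\qquad + \frac{1}{7 \cdot 6}\Big(12\sqrt{1 - Z_1^2} + 18\sqrt{1 - Z_2^2} + 12\sqrt{1 - Z_3^2}\Big) \\
&\qquad + \frac{1}{7 \cdot 6 \cdot 4}\Big(96\sqrt{1 - Z_1^2} + 48\sqrt{1 - Z_2^2} + 24\sqrt{1 - Z_3^2}\Big) \\
&= \sqrt{1 - Z_1^2} + \sqrt{1 - Z_2^2} + \sqrt{1 - Z_3^2}.
\end{aligned}
$$

This completes the proof of (10).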
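The counting behind the coefficients in the $r = 3$ example can be verified by brute force. The sketch below rests on two assumptions that are labelled as such: the ordered weight $\mathrm{wt}_r(x)$ is taken to be the index of the highest nonzero coordinate of $x$, and $a_d$ is read as the number of pairs $(x_2, x_3)$ with $x_2 \ne 0$, $x_3 \notin \{0, x_2\}$ for which the smaller of $\mathrm{wt}_r(x_3)$ and $\mathrm{wt}_r(x_2 \oplus x_3)$ equals $d$ (the two admissible values of $\alpha(f)$ for $m = 2$). The function names are illustrative. The last check confirms that for $r = 3$ the three groups of coefficients add up to exactly 1 for each $d$.

```python
from fractions import Fraction
from itertools import product


def wt(x):
    """Assumed ordered weight: 1-based index of the highest nonzero bit (0 for x = 0)."""
    return x.bit_length()


def a_formula(r, d):
    """Closed form a_d = 2^(d-1) * (2^(r+1) - 3 * 2^(d-1) - 1)."""
    return 2 ** (d - 1) * (2 ** (r + 1) - 3 * 2 ** (d - 1) - 1)


def a_count(r, d):
    """Brute-force count of admissible pairs (x2, x3) whose smaller weight is d."""
    q = 2 ** r
    return sum(
        1
        for x2, x3 in product(range(1, q), repeat=2)
        if x3 != x2 and min(wt(x3), wt(x2 ^ x3)) == d
    )


r = 3
counts = [a_count(r, d) for d in range(1, r + 1)]
print("a_d for r = 3:", counts)

# Coefficients of sqrt(1 - Z_d^2) in the r = 3 example; the last group
# (96, 48, 24) is quoted from the example rather than recomputed.
third_group = {1: 96, 2: 48, 3: 24}
totals = {
    d: Fraction(2 ** (d - 1), 7)
    + Fraction(a_formula(r, d), 7 * 6)
    + Fraction(third_group[d], 7 * 6 * 4)
    for d in range(1, r + 1)
}
print("total coefficient of each term:", totals)
```

Exact rational arithmetic via `Fraction` avoids any rounding doubt when checking that each total is 1.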
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